A Comparative Study of Clustering Algorithms on Narrow - Domain
نویسندگان
چکیده
Clustering abstracts of scientific texts of very narrow domain implies a big challenge. The first problem to attend is the high overlapping among the document’s vocabularies, besides the low frequency of these terms. The transition point technique has been successfully used in this area of Natural Language Processing (NLP). Its best properties rely on the extraction of the mid-frequency terms. Although the importance of these terms on NLP has been known from time ago, the exact extraction of these terms is unknown. In this paper we present an application of this technique as a feature selection technique in two corpora of very narrow domain. The experimental results show that the transition point technique obtains the best results of F-measure with five different clustering methods.
منابع مشابه
A Comparative Study of Some Clustering Algorithms on Shape Data
Recently, some statistical studies have been done using the shape data. One of these studies is clustering shape data, which is the main topic of this paper. We are going to study some clustering algorithms on shape data and then introduce the best algorithm based on accuracy, speed, and scalability criteria. In addition, we propose a method for representing the shape data that facilitates and ...
متن کاملA Comparative Study between a Pseudo-Forward Equation (PFE) and Intelligence Methods for the Characterization of the North Sea Reservoir
This paper presents a comparative study between three versions of adaptive neuro-fuzzy inference system (ANFIS) algorithms and a pseudo-forward equation (PFE) to characterize the North Sea reservoir (F3 block) based on seismic data. According to the statistical studies, four attributes (energy, envelope, spectral decomposition and similarity) are known to be useful as fundamental attributes in ...
متن کاملImproving Vehicular Ad-Hoc Network Stability Using Meta-Heuristic Algorithms
Vehicular ad-hoc network (VANET) is an important component of intelligent transportation systems, in which vehicles are equipped with on-board computing and communication devices which enable vehicle-to-vehicle communication. Consequently, with regard to larger communication due to the greater number of vehicles, stability of connectivity would be a challenging problem. Clustering technique as ...
متن کاملA Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملAssessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
متن کامل